Boosting MapReduce with Network-Aware Task Assignment
Abstract
Running MapReduce in shared clusters has become a recent trend for processing large-scale data analytics applications while improving cluster utilization. However, network sharing among applications can leave MapReduce applications with constrained and heterogeneous network bandwidth. This increases the severity of network hotspots in racks and makes existing task assignment policies, which focus on data locality, no longer effective. To deal with this issue, this paper develops a model to analyze the relationship between job completion time and the assignment of both map and reduce tasks across racks. We further design a network-aware task assignment strategy to shorten the completion time of MapReduce jobs in shared clusters. It integrates two simple yet effective greedy heuristics that minimize the completion time of the map phase and the reduce phase, respectively. Using large-scale simulations driven by Facebook job traces, we demonstrate that the network-aware strategy shortens the average completion time of MapReduce jobs compared with state-of-the-art task assignment strategies, with an acceptable computational overhead.
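To make the idea of a greedy, network-aware assignment concrete, here is a minimal Python sketch. It is not the paper's algorithm, which the abstract does not spell out. It assumes a deliberately simplified model: each rack runs one task at a time, a task placed outside the rack holding its input pays a transfer time of input size divided by the destination rack's available bandwidth, and tasks are considered largest first. The names `Task`, `greedy_assign`, and `bandwidth_mb_per_s` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    data_rack: str    # rack that stores the task's input (assumed known)
    size_mb: float    # input size in MB
    compute_s: float  # pure compute time in seconds


def completion_time(finish, task, rack, bandwidth_mb_per_s):
    """Finish time of `task` if it is appended to `rack`'s queue."""
    # Local placement needs no transfer; remote placement pays size / bandwidth.
    if rack == task.data_rack:
        transfer_s = 0.0
    else:
        transfer_s = task.size_mb / bandwidth_mb_per_s[rack]
    return finish[rack] + transfer_s + task.compute_s


def greedy_assign(tasks, bandwidth_mb_per_s):
    """Greedily place each task on the rack where it would finish earliest.

    bandwidth_mb_per_s maps rack name -> available inbound bandwidth (MB/s).
    Returns (placement, makespan) under the simplified one-task-at-a-time model.
    """
    finish = {rack: 0.0 for rack in bandwidth_mb_per_s}
    placement = {}
    # Consider the largest inputs first, since they are the costliest to move.
    for task in sorted(tasks, key=lambda t: t.size_mb, reverse=True):
        best = min(
            finish,
            key=lambda r: completion_time(finish, task, r, bandwidth_mb_per_s),
        )
        finish[best] = completion_time(finish, task, best, bandwidth_mb_per_s)
        placement[task.name] = best
    return placement, max(finish.values())
```

As a usage example under the same assumptions, take three 100 MB tasks whose input all sits in rack A, each needing 10 s of compute, with rack B offering 20 MB/s of available bandwidth. A strictly locality-driven placement would queue all three on rack A and finish after 30 s. The sketch moves one task to rack B, accepting a 5 s transfer, and finishes after 20 s.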
Similar resources
Network-Aware Task Assignment for MapReduce Applications in Shared Clusters
Running MapReduce applications in shared clusters is becoming increasingly compelling to improve the cluster utilization. However, the network sharing across diverse applications can make the network bandwidth for MapReduce applications constrained and heterogeneous, which inevitably increases the severity of network hotspots in racks, and makes the existing task assignment policies that focus ...
Availability and Network-Aware MapReduce Task Scheduling over the Internet
MapReduce offers an easy-to-use programming paradigm for processing large datasets. In our previous work, we designed BitDew-MapReduce, a MapReduce framework for desktop grid and volunteer computing environments that allows non-expert users to run data-intensive MapReduce jobs on top of volunteer resources over the Internet. However, network distance and resource availability have gre...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The rack-aware data placement strategy of the current Hadoop Distributed File System assumes a homogeneous cluster, in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
A Novel Protection Guaranteed, Quality of Transmission Aware Routing and Wavelength Assignment Algorithm for All-optical Networks
Transparent all-optical networks carry huge volumes of traffic, and any link failure can cause the loss of gigabits of data; hence protection, and a guarantee of it, becomes necessary at the time of failure. Many protection schemes have been presented in the literature, but none of them addresses protection guarantees. Also, in all-optical networks, due to the absence of regeneration capabilities, the physical layer i...
On Task Assignment in Data Intensive Scalable Computing
MapReduce and other Data-Intensive Scalable Computing paradigms have emerged as the most popular solution for processing massive data sets, a crucial task in surviving the “Data Deluge”. Recent works have shown that maintaining data locality is paramount to achieve high performance in such paradigms. To this end, suitable task assignment algorithms are needed. Current solutions use round-robin ...
Publication date: 2013